Abstract. An occupancy problem with an infinite number of bins and a
random probability vector for the locations of the balls is considered. The
respective sizes of bins are related to the split times of a Yule process. The
asymptotic behavior of the landscape of first empty bins, i.e., the set of
corresponding indices represented by point processes, is analyzed and convergence
in distribution to mixed Poisson processes is established. Additionally, the
influence of the random environment, the random probability vector, is
analyzed. It is represented by two main components: an i.i.d. sequence and a
fixed random variable. Each of these components has a specific impact on the
qualitative behavior of the stochastic model. It is shown in particular that for
some values of the parameters, some rare events, which are identified, play an
important role on average values of the number of empty bins in some regions.

1. Introduction

Occupancy schemes in terms of bins and balls offer a very flexible and elegant
way to formulate various problems in computer science, biology and applied
mathematics, for example. One of the earliest models investigated in the literature consists
in throwing m balls at random into n identical bins. The asymptotic behavior of
occupancy variables has been analyzed when n grows to infinity, with different scalings
in n for the variable m. The books by Johnson and Kotz [11] and Kolchin et al. [13]
are classical references on this topic. See also Chapter 6 of Barbour et al. [3] for a
recent presentation of these problems.

An extension of these models is when there is an infinite number of bins and a
probability vector $(p_n)$ on $\mathbb{N}$ describing the way balls are sent: for n > 0, $p_n$ is the
probability that a ball is sent into the nth bin. In one of the first studies in this
setting, Karlin [12] analyzed the asymptotic behavior of the number of occupied
bins. More recently, Hwang and Janson [10] proved, in a quite general framework,
central limit results for these quantities. In this setting, some additional variables
are also of interest, like the sets of indices of occupied or empty bins, adding a
geometric component to these problems. For specific probability vectors, Csaki
and Foldes [4] and Flajolet and Martin [6] investigated the index of the first empty
bin. See the recent survey of Gnedin et al. for more references on the occupancy
problem with infinitely many bins.

A further extension of these stochastic models consists in considering random
probability vectors. Gnedin [7] (and subsequent papers) analyzed the case where
$(p_n)$ decays geometrically fast according to some random variables, i.e., for $n \ge 1$,
$p_n = \prod_{i=1}^{n-1} Y_i \,(1 - Y_n)$ where $(Y_i)$ are i.i.d. random variables on (0,1). Various
asymptotic results on the number of occupied bins in this case have been obtained.
The random vector can be seen as a "random environment" for the bins and balls
problem; it complicates the asymptotic results significantly in some cases. In
particular, the indices of the urns in which the balls fall are no longer independent
random variables as in the deterministic case.

The general goal of this paper is to investigate in detail the impact of this
randomness for a bins and balls problem associated to a Yule process, see Athreya and
Ney for the definition of a Yule process. This (quite natural) stochastic model
has its origin in network modeling, see Simatos et al. [23] for a detailed presentation.
It can be described as follows: the non-decreasing sequence $(t_n)$ of split times of
the Yule process defines the bins, the nth bin, $n \ge 1$, being the interval $(t_{n-1}, t_n]$.
The locations of balls are represented by independent exponential random variables
with parameter p. The main problem investigated here concerns the asymptotic
description of the set of indices of first empty bins when the number of balls goes
to infinity. Mathematically, it is formulated as a convergence in distribution of
rescaled point processes having Dirac masses at the indices of empty bins.

For $n \ge 1$, if $P_n$ is the probability that a ball falls into the nth bin, it is easily
seen that, for large n, $P_n$ has a power law decay: it can be expressed as $V E_n / n^{p+1}$
where $(E_n)$ are i.i.d. exponential random variables with parameter 1 and V is some
independent random variable related to the limit of a martingale. The randomness
of the probability vector $(P_n)$ has two components: one which is a part of an i.i.d.
sequence, changing from one bin to another, and the other being "fixed once for
all" inducing a dependency structure. As will be seen, the two components have
separately a significant impact on the qualitative behavior of this model.

Convergence in Distribution and Rare Events. Because the variables $(E_n)$
can be arbitrarily small with positive probability, empty bins are likely to be
created earlier (i.e., with smaller indices) than for a deterministic probability vector
with the same power law decay. It is shown in fact that, for the convergence in
distribution, the first empty bins occur around indices of the order of $n^{1/(p+2)}$ instead
of $(n/\log n)^{1/(p+1)}$ in the deterministic case.

The variable V has a more subtle impact: when p > 1 it is shown that, due
to a heavy tail property related to V, rare events affect the asymptotic behavior of
averages of some of the characteristics. For $\alpha \in [1/(2p+1), 1/(p+2))$, although
the number of empty bins with indices of order $n^{\alpha}$ converges in distribution to 0, the
corresponding average converges to $+\infty$. When p < 1, the average converges to
0. A phase transition phenomenon at p = 1 has been identified through simulations
in a related context, communication networks, in Saddi and Guillemin [22]. It is
not apparent as long as convergence in distribution is concerned but it shows up
when average quantities are considered. This phenomenon is due to rare events
related to the total size of the $\lfloor p \rfloor$ first bins: on these events, the indices of the first
empty bins are of the order $n^{1/(2p+1)} \ll n^{1/(p+2)}$ and a lot of them are created on this
occasion. See Proposition 6 and Corollary 2 for a precise statement of this result.
Concerning the generality of the results obtained, it is believed that some of them
hold in a more general setting, for the underlying branching process for example,
see Section 6.

Point Processes. Technically, one mainly uses point processes on $\mathbb{R}_+$ to describe
the asymptotic behavior of the indices of the first empty bins and not only the index
of the first one (or the subsequent ones) as is usually the case in the literature.
It turns out that it is quite appropriate in our setting to get a full description of
the set of the first empty bins and, moreover, it reduces the technicalities of some
of the proofs. One of the arguments for the proofs of the convergence results is
a simple convergence result of two-dimensional point processes to a Poisson point
process with some intensity measure. A one-dimensional equivalent of this point of
view is implicit in most of the papers of the literature, in Hwang and Janson [10] in
particular. See Robert and Simatos [20] for a presentation of an extension of this
approach in a more general framework.

The paper is organized as follows. Section 2 introduces the stochastic model
investigated. The main results concerning convergence of related point processes
in $\mathbb{R}_+^2$ are presented in Section 3. Convergence results for the indices of empty bins
are proved in Section 4. Section 5 investigates in detail the case $p \ge 1$. Section 6
presents some possible extensions.

2. A Bins and Balls Problem in Random Environment 
The stochastic model is described in detail and some notations are introduced. 

The Bins. Let $(E_i)$ be a sequence of i.i.d. exponential random variables with
parameter 1. Define the non-decreasing sequence $(t_n)$ by, for $n \ge 1$,

$$t_n = \sum_{i=1}^{n} \frac{1}{i} E_i.$$

It is easy to check that for $x \ge 0$,

(1) $$P(t_n \le x) = P(\max(E_1, E_2, \ldots, E_n) \le x) = \left(1 - e^{-x}\right)^n.$$

The nth bin will be identified by the interval $(t_{n-1}, t_n]$, with the convention $t_0 = 0$.

If $H_n = 1 + 1/2 + \cdots + 1/n$ is the nth harmonic number, since $(t_n - H_n)$ is a
square integrable martingale whose increasing process is given by

$$E\left((t_n - H_n)^2\right) = \sum_{i=1}^{n} \frac{1}{i^2},$$

then $(M_n) = (t_n - \log n)$ is almost surely converging to some finite random variable
$M_\infty$. See Neveu [17] or Williams [21]. By using Equation (1), it is not difficult to
get that the distribution of $M_\infty$ is given by

(2) $$P(M_\infty \le x) = \exp\left(-e^{-x}\right), \quad x \in \mathbb{R}.$$

An alternative description of the sequence $(t_n)$ is provided by the split times of a
Yule (branching) process starting with one individual. See Athreya and Ney [5].
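
The construction of the bins and the limit (2) can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names `split_times`, `gumbel_cdf` and `empirical_cdf_of_m` are hypothetical, and the sample sizes are arbitrary choices. It simulates $t_n$ as a sum of independent exponential variables $E_i/i$ and compares the empirical distribution of $t_n - \log n$ with $\exp(-e^{-x})$.

```python
import math
import random


def split_times(n, rng):
    """Return [t_1, ..., t_n] with t_k = sum_{i<=k} E_i / i, E_i i.i.d. Exp(1)."""
    times, t = [], 0.0
    for i in range(1, n + 1):
        t += rng.expovariate(1.0) / i  # the ith bin has length E_i / i
        times.append(t)
    return times


def gumbel_cdf(x):
    """Limiting distribution function of M_n = t_n - log n, Equation (2)."""
    return math.exp(-math.exp(-x))


def empirical_cdf_of_m(n, samples, x, seed=0):
    """Monte Carlo estimate of P(t_n - log n <= x) (illustrative helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if split_times(n, rng)[-1] - math.log(n) <= x:
            hits += 1
    return hits / samples
```

For instance, `empirical_cdf_of_m(200, 4000, 0.0)` can be compared with `gumbel_cdf(0.0)`; by Equation (1) the exact finite-n value is $(1 - 1/n)^n$, so the two should be close for moderate n.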
The Balls. The locations of the balls are given by an independent sequence $(B_j)$
of i.i.d. exponential random variables with parameter p for some p > 0.

Conditionally on the point process associated with the location of bins, the
probability that a given ball falls into the nth bin $(t_{n-1}, t_n]$ is given by

$$P_n = e^{-p t_{n-1}} - e^{-p t_n} = e^{-p t_{n-1}} \left(1 - e^{-p E_n / n}\right).$$

This quantity can be rewritten as

(3) $$P_n = \frac{1}{n^{p+1}} W_n^p Z_n, \quad \text{with } Z_n = n\left(1 - e^{-p E_n/n}\right) \text{ and } W_n = n e^{-t_{n-1}}.$$

The variables $W_n$ and $Z_n$ are independent random variables with different behavior.

(1) The variables $(Z_n)$ are independent and converge in distribution to an
exponentially distributed random variable with parameter 1/p.

(2) The random variables $(W_n)$ converge almost surely to the finite random
variable $W_\infty = \exp(-M_\infty)$ which is exponentially distributed with
parameter 1.

This suggests an asymptotic representation of the sequence $(P_n)$ as

(4) $$P_n \sim \frac{1}{n^{p+1}} W_\infty^p E_n,$$

where $(E_n)$ is an i.i.d. sequence of exponential random variables with mean p
independent of $W_\infty$. The sequence $(P_n)$ has a power law decay with a random coefficient
consisting of the product of two terms: a fixed random variable $W_\infty^p$ and the other
being an element of an i.i.d. sequence. As will be seen, these two terms have a
significant impact on the bins and balls problem studied in this paper.
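
The factorisation (3) can be verified on a simulated environment. The sketch below is only illustrative and rests on the definitions $W_k = k e^{-t_{k-1}}$ and $Z_k = k(1 - e^{-pE_k/k})$ as read above; the helper names `simulate_bins` and `factorised_probability` are hypothetical. It computes $P_k = e^{-p t_{k-1}} - e^{-p t_k}$ directly and compares it with $W_k^p Z_k / k^{p+1}$.

```python
import math
import random


def simulate_bins(n_bins, p, seed=0):
    """Return a list of (k, t_{k-1}, E_k, P_k) for k = 1..n_bins.

    P_k = exp(-p t_{k-1}) - exp(-p t_k) is the conditional probability that
    an Exp(p) ball lands in the kth bin (t_{k-1}, t_k].
    """
    rng = random.Random(seed)
    rows, t_prev = [], 0.0
    for k in range(1, n_bins + 1):
        e_k = rng.expovariate(1.0)
        t_k = t_prev + e_k / k
        rows.append((k, t_prev, e_k, math.exp(-p * t_prev) - math.exp(-p * t_k)))
        t_prev = t_k
    return rows


def factorised_probability(k, t_prev, e_k, p):
    """Right-hand side of (3): W_k^p Z_k / k^(p+1)."""
    w_k = k * math.exp(-t_prev)               # W_k = k e^{-t_{k-1}}
    z_k = k * (1.0 - math.exp(-p * e_k / k))  # Z_k = k (1 - e^{-p E_k / k})
    return w_k ** p * z_k / k ** (p + 1)
```

The probabilities of the first n bins telescope to $1 - e^{-p t_n}$, which gives a second sanity check on the simulated vector.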



3. Convergence of Point Processes 

One of the main results, Theorem 2 in the next section, which establishes
convergence results for the indices of the first empty bins, is closely related to the
asymptotic behavior of the point process $\{(i/n^{1/(p+2)}, nP_i), i \ge 1\}$ on $\mathbb{R}_+^2$. For this
reason, some results on convergence of point processes in $\mathbb{R}_+^2$ are first proved. The
point process associated to the $(nP_i)$ appears quite naturally, especially in view
of the Poisson transform used in the proof of Theorem 2. This is also a central
variable in Hwang and Janson [10] in some cases.

An important tool to study point processes in $\mathbb{R}_+^d$ for some $d \ge 1$ is the Laplace
transform: if $\mathcal{N} = \{t_n, n \ge 1\}$ is a point process and f a function in $C_c^+(\mathbb{R}_+^d)$, the
set of non-negative continuous functions with a compact support, it is defined as
$E(\exp(-\mathcal{N}(f)))$, where

$$\mathcal{N}(f) = \sum_{n \ge 1} f(t_n).$$

This functional uniquely determines the distribution of $\mathcal{N}$ and it is a key tool to
establish convergence results. See Neveu [18] and Dawson [5] for a comprehensive
presentation of these questions. In the following, the quantity $\mathcal{N}(A)$ denotes the
number of $t_n$'s in the subset A of $\mathbb{R}_+^d$.

The main results of this section establish convergence in distribution to mixed
Poisson point processes, i.e., distributed as a Poisson point process with a
parameter which is a random variable. A natural tool in this domain is the Chen-Stein
approach which gives the convergence in distribution and, generally, quite good
bounds on the convergence rate. See Chapter 10 of Barbour et al. [3] for example.
This has been used in Simatos et al. [23], when the probability vector is
deterministic. For some of the results of this section, this approach can probably also be
used. Unfortunately, due to the almost surely converging sequence $(W_n)$ creating
a dependency structure, it does not seem that the main convergence result,
Theorem 1, can be proved in a simple way by using Chen-Stein's method. The main
difficulty is to condition on the variable $W_\infty$ while keeping at the same time
upper bounds on the total variation distance that converge to 0.

Condition C. A sequence of independent random variables $(X_i)$ satisfies
Condition C if there exist some $\alpha > 0$, $\eta > 0$ and a finite constant C such that, for all $i \ge 1$,

(5) $$|P(X_i \le x) - \alpha x| \le C x^2 \quad \text{when } 0 \le x \le \eta.$$

The following proposition is a preliminary result that will be used to prove the main
convergence results for the indices of the first empty bins.

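Proposition 1 (Convergence to a Poisson process). For $\delta > 0$ and $n \ge 1$, let $\mathcal{P}_n$
be the point process on $\mathbb{R}_+^2$ defined by

$$\mathcal{P}_n = \left\{\left(\frac{i}{n^{1/(\delta+1)}}, \frac{n}{i^\delta} X_i\right), i \ge 1\right\},$$

where $(X_i)$ is a sequence of non-negative independent random variables satisfying
Condition C. Then the sequence of point processes $(\mathcal{P}_n)$ converges in distribution to a
Poisson point process $\mathcal{P}$ in $\mathbb{R}_+^2$ with intensity measure $\alpha x^\delta\,dx\,dy$ on $\mathbb{R}_+^2$. In particular,
its Laplace transform is given by

(6) $$E\left(\exp[-\mathcal{P}(f)]\right) = \exp\left(-\alpha \int_{\mathbb{R}_+^2} \left(1 - e^{-f(x,y)}\right) x^\delta\,dx\,dy\right), \quad f \in C_c^+(\mathbb{R}_+^2).$$

See Robert [19] for the definition and the main properties of Poisson processes in
general state spaces.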
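
As a quick plausibility check of the intensity measure in Proposition 1, one can compare the exact mean number of points of the finite-n process in a rectangle $[0,a] \times [0,b]$ with the limiting value $\alpha b a^{\delta+1}/(\delta+1)$. The sketch below is an illustration under a specific assumption that is not in the paper: the $X_i$ are taken exponential with parameter 1, which satisfy Condition C with $\alpha = 1$. The function names are hypothetical.

```python
import math


def expected_points(n, delta, a, b):
    """Exact mean number of points (i / n^(1/(delta+1)), n X_i / i^delta)
    falling in [0, a] x [0, b] when the X_i are i.i.d. Exp(1)."""
    i_max = int(a * n ** (1.0 / (delta + 1.0)))
    # P(n X_i / i^delta <= b) = 1 - exp(-b i^delta / n) for an Exp(1) variable
    return sum(1.0 - math.exp(-b * i ** delta / n) for i in range(1, i_max + 1))


def limiting_mean(alpha, delta, a, b):
    """Mass of the measure alpha x^delta dx dy on [0, a] x [0, b]."""
    return alpha * b * a ** (delta + 1) / (delta + 1)
```

With `delta = 1.0`, `a = b = 1.0` and `n = 10**6`, `expected_points` should be close to `limiting_mean(1.0, 1.0, 1.0, 1.0)`, i.e. to 1/2.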

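Proof. There exists some $\eta_0 > 0$ such that $P(X_i \le x) \le 2\alpha x$ for $0 \le x \le \eta_0$ and all
$i \ge 1$. Let $f \in C_c^+(\mathbb{R}_+^2)$ be such that f is differentiable with respect to the second
variable. There is some K > 0 so that the support of f is included in $[0, K] \times [0, K]$,
define $g(x, y) = 1 - \exp(-f(x, y))$, then by independence of the variables $X_i$, $i \ge 1$,

$$\log E\left(e^{-\mathcal{P}_n(f)}\right) = \sum_{i=1}^{+\infty} \log\left(1 - E\left[g\left(\frac{i}{n^{1/(\delta+1)}}, \frac{n}{i^\delta} X_i\right)\right]\right).$$

Since

$$E\left[g\left(\frac{i}{n^{1/(\delta+1)}}, \frac{n}{i^\delta} X_i\right)\right] \le P\left(X_i \le K \frac{i^\delta}{n}\right) 1_{\{i \le K n^{1/(\delta+1)}\}},$$

the elementary inequality $|\log(1-y) + y| \le 3y^2/2$ valid for $0 \le y \le 1/2$ shows that
there exists some $n_0 \ge 1$ such that

$$\left|\log E\left(e^{-\mathcal{P}_n(f)}\right) + \sum_{i=1}^{+\infty} E\left[g\left(\frac{i}{n^{1/(\delta+1)}}, \frac{n}{i^\delta} X_i\right)\right]\right| \le 6(\alpha K)^2 \sum_{i=1}^{\lfloor K n^{1/(\delta+1)} \rfloor} \frac{i^{2\delta}}{n^2} \le 6\alpha^2 K^{2\delta+3} \frac{1}{n^{1/(\delta+1)}}$$

holds for $n \ge n_0$. It is therefore enough to study the asymptotics of the series of
the left hand side of the above inequality. For $x \ge 0$, by using Fubini's Theorem,
one gets

$$E\left(g\left(x, \frac{n}{i^\delta} X_i\right)\right) = -\int_0^{+\infty} \frac{\partial g}{\partial y}(x, y)\, P\left(X_i \le y i^\delta / n\right) dy.$$

By using again Condition C, one obtains that the log of the Laplace transform of
$\mathcal{P}_n$ has the same asymptotic behavior as

$$\frac{\alpha}{n^{1/(\delta+1)}} \sum_{i \ge 1} \int_0^{+\infty} \frac{\partial g}{\partial y}\left(\frac{i}{n^{1/(\delta+1)}}, y\right) y \left(\frac{i}{n^{1/(\delta+1)}}\right)^\delta dy,$$

which is a Riemann sum converging to

$$\alpha \int_{\mathbb{R}_+^2} \frac{\partial g}{\partial y}(x, y)\, y\, x^\delta\,dx\,dy = -\alpha \int_{\mathbb{R}_+^2} \left(1 - e^{-f(x,y)}\right) x^\delta\,dx\,dy.$$

This shows in particular that for any compact set H of $\mathbb{R}_+^2$,

$$\sup_{n \ge 1} E(\mathcal{P}_n(H)) < +\infty,$$

the sequence $(\mathcal{P}_n)$ is therefore tight for the weak topology, see Dawson [5].

By the convergence result, if $\mathcal{P}$ is any limiting point of the sequence $(\mathcal{P}_n)$, for
any function $f \in C_c^+(\mathbb{R}_+^2)$ such that $y \mapsto f(x, y)$ is differentiable, then the Laplace
transform of $\mathcal{P}$ at f is given by the right hand side of Equation (6). By density of
these functions f in $C_c^+(\mathbb{R}_+^2)$ for the uniform topology, this implies that $\mathcal{P}$ is indeed
a Poisson point process with intensity measure $\alpha x^\delta\,dx\,dy$ on $\mathbb{R}_+^2$. The proposition is
proved. □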

The above result can be (roughly) restated as follows: for the indices of the order
of $n^{1/(\delta+1)}$, the points $nX_i/i^\delta$ lying in some finite fixed interval converge to a
homogeneous Poisson point process. The following proposition gives an asymptotic
description of the indices of the points $nX_i/i^\delta$ but for indices of the order of $n^\kappa$
with $\kappa > 1/(\delta + 1)$. It shows that, on finite intervals, these points accumulate at
rate $n^{\kappa(\delta+1)-1}$ according to the Lebesgue measure with some density.

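Proposition 2 (Law of Large Numbers). If, for $\kappa > 1/(1 + \delta)$ and for $n \in \mathbb{N}$,
$\mathcal{P}_n^\kappa$ is the point process on $\mathbb{R}_+^2$ defined by

$$\mathcal{P}_n^\kappa(f) = \frac{1}{n^{\kappa(\delta+1)-1}} \sum_{i=1}^{+\infty} f\left(\frac{i}{n^\kappa}, \frac{n}{i^\delta} X_i\right),$$

where $(X_i)$ is a sequence of non-negative independent random variables satisfying
Condition C, then the sequence $(\mathcal{P}_n^\kappa)$ converges in distribution to the deterministic
measure $\mathcal{P}_\infty^\kappa$ defined by

$$\mathcal{P}_\infty^\kappa(f) = \alpha \int_{\mathbb{R}_+^2} f(x, y)\, x^\delta\,dx\,dy, \quad f \in C_c^+(\mathbb{R}_+^2).$$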

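Proof. Let $f \in C_c^+(\mathbb{R}_+^2)$ be such that f is differentiable with respect to the second
variable. As before, the convergence result is proved for such a function f, the
generalization to an arbitrary function $f \in C_c^+(\mathbb{R}_+^2)$ follows the same lines as the
previous proof (relative compactness argument and identification of the limit). Let
K > 0 be such that the support of f is included in $[0, K] \times [0, K]$. One has

$$\mathcal{P}_n^\kappa(f) = -\frac{1}{n^{\kappa(\delta+1)-1}} \sum_{i=1}^{+\infty} \int_0^{+\infty} \frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right) 1_{\{X_i \le y i^\delta/n\}}\,dy,$$

as in the previous proof, by using Condition C, one gets that

$$E\left(\mathcal{P}_n^\kappa(f)\right) \sim -\frac{\alpha}{n^\kappa} \sum_{i=1}^{+\infty} \int_0^{+\infty} \frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right) y \left(\frac{i}{n^\kappa}\right)^\delta dy,$$

therefore,

$$\lim_{n \to +\infty} E\left(\mathcal{P}_n^\kappa(f)\right) = -\alpha \int_0^K \int_0^K \frac{\partial f}{\partial y}(x, y)\, y\, x^\delta\,dx\,dy = \alpha \int_{\mathbb{R}_+^2} f(x, y)\, x^\delta\,dx\,dy.$$

By independence of the $X_i$'s the second moment of the difference

$$\mathcal{P}_n^\kappa(f) - E\left(\mathcal{P}_n^\kappa(f)\right) = -\frac{1}{n^{\kappa(\delta+1)-1}} \sum_{i=1}^{+\infty} \int_0^{+\infty} \frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right) \left[1_{\{X_i \le y i^\delta/n\}} - P\left(X_i \le y i^\delta/n\right)\right] dy,$$

can be expressed as

$$E\left(\left[\mathcal{P}_n^\kappa(f) - E\left(\mathcal{P}_n^\kappa(f)\right)\right]^2\right) = \frac{1}{n^{2(\kappa(\delta+1)-1)}} \sum_{i=1}^{+\infty} E\left(\left[\int_0^{+\infty} \frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right) \left[1_{\{X_i \le y i^\delta/n\}} - P\left(X_i \le y i^\delta/n\right)\right] dy\right]^2\right)$$

$$\le \frac{K}{n^{2(\kappa(\delta+1)-1)}} \sum_{i=1}^{+\infty} E\left(\int_0^{+\infty} \left[\frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right)\right]^2 \left[1_{\{X_i \le y i^\delta/n\}} - P\left(X_i \le y i^\delta/n\right)\right]^2 dy\right)$$

$$\le \frac{K}{n^{2(\kappa(\delta+1)-1)}} \sum_{i=1}^{+\infty} \int_0^{+\infty} \left[\frac{\partial f}{\partial y}\left(\frac{i}{n^\kappa}, y\right)\right]^2 P\left(X_i \le y i^\delta/n\right) dy,$$

by Cauchy-Schwarz's Inequality. The last term is, with the same arguments as for
the asymptotics of $E(\mathcal{P}_n^\kappa(f))$, equivalent to

$$\frac{\alpha K}{n^{\kappa(\delta+1)-1}} \int_{\mathbb{R}_+^2} \left[\frac{\partial f}{\partial y}(x, y)\right]^2 y\, x^\delta\,dx\,dy.$$

In particular, the sequence $(\mathcal{P}_n^\kappa(f))$ converges in $L_2$ (and therefore in distribution)
to $\mathcal{P}_\infty^\kappa(f)$. The proposition is proved. □

The main convergence result can now be established.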

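Theorem 1. If, for $n \ge 1$, $\mathcal{P}_n$ is the point process on $\mathbb{R}_+^2$ defined by

$$\mathcal{P}_n = \left\{\left(\frac{i}{n^{1/(p+2)}}, nP_i\right), i \ge 1\right\},$$

then the sequence $(\mathcal{P}_n)$ converges in distribution and the relation

(7) $$\lim_{n \to +\infty} E\left(e^{-\mathcal{P}_n(f)}\right) = E\left(\exp\left(-\frac{W_\infty^{-p}}{p} \int_{\mathbb{R}_+^2} \left(1 - e^{-f(x,y)}\right) x^{p+1}\,dx\,dy\right)\right)$$

holds for any $f \in C_c^+(\mathbb{R}_+^2)$.

In other words the point process $\mathcal{P}_n$ converges in distribution to a mixed Poisson
point process: conditionally on $W_\infty$, it is a Poisson process with intensity measure
$W_\infty^{-p} x^{p+1}\,dx\,dy/p$.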

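Proof. The proof proceeds in several steps. The main objective of these steps
is to decouple the sequences $(W_i)$ and $(Z_i)$ defining the $(P_i)$ and then to apply
Proposition 1.

Step 1. One defines the sequences

$$P_i^1 = \frac{1}{i^{p+1}} \widetilde{W}_\infty^p \widetilde{Z}_i, \quad i \ge 1, \qquad P_i^2 = \frac{1}{i^{p+1}} \widetilde{W}_{\beta_n}^p \widetilde{Z}_i, \quad i \ge 1,$$

where $(\beta_n)$ is some sequence of integers converging to $+\infty$. The sequences of
random variables $(\widetilde{W}_i, 1 \le i \le +\infty)$ and $(\widetilde{Z}_i)$ are assumed to be independent and
to have, respectively, the same distribution as $(W_i, 1 \le i \le +\infty)$ and $(Z_i)$ defined
by Equation (3). Recall that the sequence $(\widetilde{W}_i)$ converges almost surely to $\widetilde{W}_\infty$.
These sequences define point processes in the following way, for j = 1 and 2,

$$\mathcal{P}_n^j = \left\{\left(\frac{i}{n^{1/(p+2)}}, nP_i^j\right), i \ge 1\right\}.$$

If f is a non-negative continuous function with compact support on $\mathbb{R}_+^2$, because,
conditionally on $\widetilde{W}_\infty$, the variables $(\widetilde{W}_\infty^p \widetilde{Z}_i)$ satisfy Condition C with $\alpha = \widetilde{W}_\infty^{-p}/p$,
Proposition 1 with $\delta = p + 1$ shows that

$$\lim_{n \to +\infty} E\left(e^{-\mathcal{P}_n^1(f)} \,\middle|\, \widetilde{W}_\infty\right) = \exp\left(-\frac{\widetilde{W}_\infty^{-p}}{p} \int_{\mathbb{R}_+^2} \left(1 - e^{-f(x,y)}\right) x^{p+1}\,dx\,dy\right).$$

Because of the boundedness of these quantities, by Lebesgue's Theorem, the same
result holds for the expected values. Therefore, the sequence $(\mathcal{P}_n^1)$ converges in
distribution to the point process $\mathcal{P}$ on $\mathbb{R}_+^2$ whose Laplace transform is given by
Equation (7).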

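Let $K \ge 2$ be such that the support of f is a subset of $[0, K]^2$ and $\varepsilon > 0$. Since
the limiting point process $\mathcal{P}$ is almost surely a Radon measure, there exists some
$m \in \mathbb{N}$ such that $P(\mathcal{P}_n^1([0, 2K]^2) \ge m) \le \varepsilon$ for all $n \ge 1$. By uniform continuity,
there exists $0 < \eta < 1/2$ such that $|f(u) - f(v)| \le \varepsilon/m$ for $u, v \in \mathbb{R}_+^2$ such that
$\|u - v\| \le \eta$. For $n \ge 1$, if

$$A_n = \left\{\left|\widetilde{W}_{\beta_n}^p / \widetilde{W}_\infty^p - 1\right| > \eta/2K\right\} \cup \left\{\mathcal{P}_n^1([0, 2K]^2) > m\right\}$$

then

$$\left|E\left(\exp\left[-\mathcal{P}_n^1(f)\right]\right) - E\left(\exp\left[-\mathcal{P}_n^2(f)\right]\right)\right| \le P(A_n) + E\left(\left|1 - \exp\left(-\sum_{i \ge 1} \left[f\left(\frac{i}{n^{1/(p+2)}}, nP_i^2\right) - f\left(\frac{i}{n^{1/(p+2)}}, nP_i^1\right)\right]\right)\right| 1_{A_n^c}\right)$$

$$\le P\left(\left|\widetilde{W}_{\beta_n}^p / \widetilde{W}_\infty^p - 1\right| > \eta/2K\right) + 2\varepsilon,$$

hence, by the almost sure convergence of $(\widetilde{W}_n)$ to $\widetilde{W}_\infty$, the right hand side of the
last relation can be made arbitrarily small as n goes to infinity. One concludes that the
sequence $(\mathcal{P}_n^2)$ also converges in distribution to the point process $\mathcal{P}$.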
Step 2. For $n \ge 1$, define

$$\beta_n = \left\lfloor \frac{n^{1/(p+2)}}{\log n} \right\rfloor,$$

then it will be shown that the point processes

$$\mathcal{Q}_n = \left\{\left(\frac{i}{n^{1/(p+2)}}, nP_i\right), 1 \le i \le \beta_n\right\}$$

converge to the measure identically null. It is sufficient to prove that for any
$f \in C_c^+(\mathbb{R}_+^2)$, the sequence $(\mathcal{Q}_n(f))$ converges in distribution to 0. For a fixed i,
the sequence $(nP_i)$ converges in distribution to infinity; since f is continuous with
compact support and therefore bounded, one obtains that, in the definition of $\mathcal{Q}_n$,
it can be assumed that the indices i are restricted to the set $\{\lceil p \rceil, \ldots, \beta_n\}$.

Let K be such that the support of f is included in $[0, K]^2$, if $u_n = \log \log n$, for
$i \ge \lceil p \rceil$,

E(/(Vn^/(''+2\nP,)l{tLpj<«„}) < ll/llooP(tL.J <Un,nPi<K), 
since Pi = e-P*LpJ e-''(**-i-*Lpj) (l - e-P^'/'), 



E 



(/(z/ni/(''+2),nP.)l{n.j<«.}) 



< 



1 - e 



-pE,/i 



p/i 



P 



By using the elementary inequality, if Ei is exponentially distributed with mean 1, 



(8) 



(i(l-e--0<-)<e(l-e-i. 



y<l,a;>0. 



one gets that, for i > p, 

<e||/||ooE( 1-exp 



np 



<ei^||/||^- E 



np 



eKWfW 



np 



— Cj , 



Thus, there exists some finite constant C such that, for i > p. 
consequently, 

E(2n(/)l{tLpJ<».}) 

This relation and the inequality 

E (l - e-2'.(/)) < P(tLpj > Un)+E(^QMn{t,,,<u^}) 
give the desired result. 



(logn) 2 
Step 3. The proof of the theorem can be now completed. By Equation (3), for $i \ge 1$,
$P_i = W_i^p Z_i / i^{p+1}$, by using Step 2 and the same techniques as in Step 1 together
with the fact that, for $\eta > 0$, the probability of the event

$$\left\{\sup\left(\left|W_i^p / W_{\beta_n}^p - 1\right| : i \ge \beta_n\right) > \eta\right\}$$

converges to 0 as n gets large, it is not difficult to show that the sequences of point
processes

$$\left\{\left(\frac{i}{n^{1/(p+2)}}, nP_i\right), i \ge 1\right\} \quad \text{and} \quad \left\{\left(\frac{i}{n^{1/(p+2)}}, n \frac{W_{\beta_n}^p Z_i}{i^{p+1}}\right), i \ge 1\right\}$$

have the same limit in distribution. Because $W_{\beta_n}$ is independent of $(Z_i, i \ge \beta_n)$, the
last point process has the same distribution as $\mathcal{P}_n^2$ (up to the first $\beta_n$ terms which
are negligible similarly as in Step 2). By Step 1, the convergence in distribution is
therefore proved. □

The following proposition strengthens the statement of Proposition 1; it will be
used to prove the main asymptotic result on the indices of empty bins.



Proposition 3. // / : is a continuous function such that 

(1) there exists K such that f{x,y) = for any x < K and y G 

(2) for all X G M^, the function y f{x,y) is differentiable and 



df 



dy 



def. 



y sup 



df 



dy 



ix,y) 



is integrable on K_|_ , 
then Convergence ([7]) also holds for f . 

Proof. For M, L > and i, n E N, one has 



r+oc Qj 



dy 



P(Af <nP,<y,typ^ <L)dy. 



By using similar arguments as in the end of the proof of the above theorem, one 
gets 



E / 



nPi j l{nPi>Af,tLpj<L} 
n+00 



< e 



< 



16' 



dy 



E 1 - exp 



pL p(ti- 



up 



-yC e 



dy 



up 



< C- 



eE (e''(*'-i~*LPj)^ I y 
dy, 



df 



dy 



dy 



M 



dl 
dy 
for some fixed constant C. Define $k_n = \lfloor K n^{1/(p+2)} \rfloor$, by summing up these terms,
this gives the relation



(9) E E/bTI'-^Ol 



4A/<riPi,tLpj <L] 



< c 



/ y ^ dy< CK^+'e^^ / : 



df 



dy 



dy. 



Define $f_0(x, y) = f(x, y) 1_{\{y \le M\}}$, by using a convolution kernel on the variable y,
there exist sequences $(g_q^+)$ and $(g_q^-)$ in $C_c^+(\mathbb{R}_+^2)$ converging pointwise to $f_0$ at every
point with $y \ne M$ and such that $g_q^- \le f_0 \le g_q^+$. See Rudin [21] for example. Proposition 1 gives
that

$$E\left(\exp(-\mathcal{P}(g_q^+))\right) \le \liminf_{n \to +\infty} E\left(\exp(-\mathcal{P}_n(f_0))\right) \le \limsup_{n \to +\infty} E\left(\exp(-\mathcal{P}_n(f_0))\right) \le E\left(\exp(-\mathcal{P}(g_q^-))\right),$$

and Expression (7) shows that, as q goes to infinity, the left and right hand side
terms of this relation converge to the Laplace transform of $\mathcal{P}$ at $f_0$. Therefore,
Convergence (7) holds at $f_0$. Since



< 



l{tLpJ<i} 



< P(tLpJ > L)+¥.[{VM)^Vn{h)) l{tL,j<L} 

and the last term being the left hand side of Relation (9), one can choose L and
M sufficiently large so that this difference is arbitrarily small. The proposition is
proved. □

4. Asymptotic Behavior of the Indices of the First Empty Bins 

It is assumed that a large number n of balls are thrown in the bins according
to the probability distribution $(P_i)$ defined by Equation (3). The purpose of this
section is to establish limit theorems to describe the limiting distribution of the set
of indices of bins having a fixed number of balls.

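Theorem 2. The point process of rescaled indices of empty bins associated to the
probability vector $(P_i)$ when n balls have been used,

$$\mathcal{N}_n = \left\{\frac{i}{n^{1/(p+2)}} : i \ge 1, \text{ the ith bin is empty}\right\},$$

converges in distribution as n goes to infinity to a point process $\mathcal{N}_\infty$ whose
distribution is given by

(10) $$E\left(e^{-\mathcal{N}_\infty(g)}\right) = E\left(\exp\left(-\frac{W_\infty^{-p}}{p} \int_0^{+\infty} \left(1 - e^{-g(x)}\right) x^{p+1}\,dx\right)\right)$$

for $g \in C_c^+(\mathbb{R}_+)$. Equivalently $(\mathcal{N}_n)$ converges in distribution to the point process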
where (ti) is a standard Poisson process with parameter [p{p + 2)] ^/(p+^\

It can also be shown that the same result holds when the indices of bins
containing k balls are considered. If $\mathcal{N}_{k,n}$ is the corresponding point process, the
limiting point process does not in fact depend on k and, moreover, the sequence
$(\mathcal{N}_{k,n}, k \ge 0)$ converges in distribution to $(\mathcal{N}_{k,\infty}, k \ge 0)$ and, conditionally on $W_\infty$,
the random variables $\mathcal{N}_{k,\infty}$, $k \ge 0$, are independent with the same distribution.

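Proof. Poissonization. A closely related model is first analyzed when $U_n$ balls are
used, $U_n$ being an independent Poisson random variable with mean n, $\mathcal{N}_n^0$ denotes
the corresponding point process. For this model, conditionally on the sequence
$(P_i)$, the numbers of balls in the bins are independent Poisson random variables
with respective parameters $(nP_i)$. In a first step, the convergence in distribution of
the sequence $(\mathcal{N}_n^0)$ of point processes is investigated. Let $g \in C_c^+(\mathbb{R}_+)$,

$$E\left(e^{-\mathcal{N}_n^0(g)}\right) = E\left(\exp\left(\sum_{i=1}^{+\infty} \log\left[1 - e^{-nP_i}\left(1 - e^{-g(i/n^{1/(p+2)})}\right)\right]\right)\right),$$

if one defines $f(x, y) = -\log\left[1 - e^{-y}\left(1 - e^{-g(x)}\right)\right]$, then

$$E\left(\exp\left[-\mathcal{N}_n^0(g)\right]\right) = E\left(\exp\left[-\mathcal{P}_n(f)\right]\right),$$

where $\mathcal{P}_n$ is the point process defined in Theorem 1. By using Proposition 3, one
gets the relation

$$\lim_{n \to +\infty} E\left(e^{-\mathcal{N}_n^0(g)}\right) = E\left(\exp\left(-\frac{W_\infty^{-p}}{p} \int_{\mathbb{R}_+^2} \left(1 - e^{-f(x,y)}\right) x^{p+1}\,dx\,dy\right)\right) = E\left(\exp\left(-\frac{W_\infty^{-p}}{p} \int_0^{+\infty} \left(1 - e^{-g(x)}\right) x^{p+1}\,dx\right)\right).$$

For $0 < a < 1$, it is not difficult to check that the same result holds for the case
when $U_{n+n^a}$ balls are used, $\mathcal{N}_n^1$ denotes the associated point process. For $x \ge 0$,
the monotonicity property that the number of empty bins with rescaled index in $[0, x]$
does not increase when more balls are thrown gives the relation

$$P\left(\mathcal{N}_n([0, x]) \le k\right) \le P\left(\mathcal{N}_n^1([0, x]) \le k\right) + P\left(U_{n+n^a} \le n\right).$$

The central limit theorem for Poisson processes shows that for $a \in (1/2, 1)$, the
quantity $P(U_{n+n^a} \le n)$ converges to 0 as n gets large, therefore if $k \ge 0$,

$$\limsup_{n \to +\infty} P\left(\mathcal{N}_n([0, x]) \le k\right) \le \lim_{n \to +\infty} P\left(\mathcal{N}_n^1([0, x]) \le k\right).$$

By using a similar argument with the liminf, one gets that the sequences $(\mathcal{N}_n)$
and $(\mathcal{N}_n^0)$ converge in distribution and have the same limit. The theorem is
proved. □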



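Corollary 1. If $\nu_n$ is the index of the first empty bin when n balls are thrown, then

$$\lim_{n \to +\infty} P\left(\frac{\nu_n}{n^{1/(p+2)}} \ge x\right) = E\left(\exp\left(-\frac{x^{p+2} W_\infty^{-p}}{p(p+2)}\right)\right), \quad x \ge 0.$$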
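
The limit in Corollary 1 has no closed form in general, but it is easy to evaluate. The sketch below is an illustration only, with hypothetical function names. It computes $E(\exp(-x^{p+2} W_\infty^{-p}/(p(p+2))))$ by a midpoint rule, using that $W_\infty$ is exponential with parameter 1, and compares it with a Monte Carlo sample of the first point of the limiting process. The sampler relies on an assumption derived here rather than taken from the text: conditionally on $W_\infty$, the first point of a Poisson process with intensity $W_\infty^{-p} x^{p+1}\,dx/p$ can be written as $(p(p+2) W_\infty^p S)^{1/(p+2)}$ with S an independent Exp(1) variable.

```python
import math
import random


def limit_survival(x, p, steps=100_000, w_max=40.0):
    """Midpoint-rule value of E[exp(-x^(p+2) W^(-p) / (p (p+2)))], W ~ Exp(1)."""
    c = x ** (p + 2) / (p * (p + 2))
    h = w_max / steps
    total = 0.0
    for j in range(steps):
        w = (j + 0.5) * h
        total += math.exp(-w - c * w ** (-p))  # density e^{-w} times the integrand
    return total * h


def sample_first_point(p, rng):
    """Sample the first point of the limiting mixed Poisson process
    (representation derived for this sketch, see the lead-in)."""
    w = rng.expovariate(1.0)  # W_infinity
    s = rng.expovariate(1.0)  # first point of a unit-rate Poisson process
    return (p * (p + 2) * w ** p * s) ** (1.0 / (p + 2))


def monte_carlo_survival(x, p, samples=20_000, seed=0):
    """Empirical estimate of P(first point >= x)."""
    rng = random.Random(seed)
    return sum(sample_first_point(p, rng) >= x for _ in range(samples)) / samples
```

The two estimates should agree up to Monte Carlo error, e.g. for `p = 1.0` and `x = 1.0`.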
Comparison with Deterministic Power Law Decay. For $\delta > 1$, one considers
the bins and balls problem with the probability vector $Q = (Q_i, i \ge 1) = (\alpha/i^\delta)$.
Note that for the problems analyzed in this paper, only the asymptotic behavior of
the sequence $(Q_i)$ matters. The equivalent of Theorem 2 can be obtained directly
from Theorem 1 of Simatos et al. [23].



Proposition 4. As n goes to infinity, the point process 
■i(logn)i/*-i 1 + 5 



■log log n : the ith bin is empty 



converges in distribution to a Poisson point process with the intensity measure 
{ad)^/^e''dx on R. 

The probability vector considered in the above theorem has an asymptotic
expression of the form $(P_i) = (W_\infty^p E_i / i^{p+1})$. In this case, empty bins show up for
indices of the order of $n^{1/(p+2)}$, i.e., much earlier than for the deterministic case
where the exponent of n is $1/\delta = 1/(p+1)$ (if one ignores the log). This can be
explained simply by the fact that some of the i.i.d. exponential random variables
$(E_i)$ can be very small thereby creating an additional possibility of having empty
bins.

In this picture, the variable $W_\infty$ does not seem to have an influence on the
qualitative behavior of these occupancy schemes other than creating some dependency
structure for the vector $(P_i)$. The next section shows that this variable nevertheless
has an important role if one looks at the averages of the number of empty
bins.

5. Rare Events 

By Equation (7) of Theorem 1, for $x \ge 0$, the limiting number (in distribution)
of empty bins whose index is less than $x n^{1/(p+2)}$ has an average value given by

$$\frac{x^{p+2}}{p(p+2)} E\left(W_\infty^{-p}\right) = \frac{x^{p+2}}{p(p+2)} \int_0^{+\infty} \frac{1}{u^p} e^{-u}\,du$$

by Equation (2) and since $W_\infty = \exp(-M_\infty)$. This quantity is infinite when $p \ge 1$.
The purpose of this section is to investigate this phenomenon which has a significant
impact on the system at the origin of this model. It is assumed throughout this
section that $p \ge 1$.
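
The divergence of $E(W_\infty^{-p})$ comes from the behavior of $u^{-p} e^{-u}$ near 0. As an illustration (a sketch with a hypothetical helper name, not part of the paper), the truncated integral $\int_\varepsilon^{+\infty} u^{-p} e^{-u}\,du$ can be computed numerically: it stays bounded as $\varepsilon \to 0$ when p < 1, where it converges to $\Gamma(1-p)$, grows like $\log(1/\varepsilon)$ when p = 1, and like $\varepsilon^{1-p}$ when p > 1.

```python
import math


def truncated_inverse_moment(p, eps, steps=100_000, u_max=50.0):
    """Midpoint rule for the integral of u^(-p) e^(-u) over [eps, u_max].

    The substitution u = e^s turns the integrand into exp((1 - p) s - e^s),
    which is well behaved on a logarithmic grid.
    """
    a, b = math.log(eps), math.log(u_max)
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        s = a + (j + 0.5) * h
        total += math.exp((1.0 - p) * s - math.exp(s))
    return total * h
```

Comparing `truncated_inverse_moment(0.5, 1e-12)` with `math.gamma(0.5)`, and evaluating the function for p = 1 or p = 1.5 with decreasing `eps`, shows the three regimes.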

Definition 1. If $\phi : \mathbb{N} \to \mathbb{R}_+$ is a non-decreasing function, for $n \ge 1$, $\mathcal{N}_n^\phi$ denotes
the point process defined by

$$\mathcal{N}_n^\phi = \left\{\frac{i}{\phi(n)} : i \ge 1, \text{ the ith bin is empty}\right\}.$$

For $i > \lfloor p \rfloor$, the quantity $P_i$ can be written as $P_i = \exp(-p t_{\lfloor p \rfloor}) D_i Z_i / i^{p+1}$ with

$$D_i = \exp\left(-p\left[t_{i-1} - \log i - M_{\lfloor p \rfloor} - \log \lfloor p \rfloor\right]\right).$$

The sequence $(D_i)$ converges almost surely to a finite limit $D_p$ given by

(11) $$D_p = \exp\left(-p\left[M_\infty - M_{\lfloor p \rfloor} - \log \lfloor p \rfloor\right]\right),$$

and, since $\exp(pE_i/i)$ is integrable for i > p, a similar result holds for the expected
values

$$\lim_{i \to +\infty} E(1/D_i) = E(1/D_p) < +\infty.$$

With this definition, the asymptotic representation of $(P_i)$ can be given as $P_i \sim
\exp(-p t_{\lfloor p \rfloor}) D_p E / i^{p+1}$ where E is an independent exponential random variable with
parameter 1. In a similar way as before, this representation can be shown to be
valid for the results obtained in this section.

For < p < 1 and n > 1 , the elementary inequality 

|e-"P _ (1 < —ne-"P < 

' '2 n 

gives directly the following lemma which will be used repeatedly in this section. 
Lemma. For a non- decreasing function 4>, x > 0, and n > 1, then 



[a:0(n)J 

1=1 



< 2e 



2 



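When $\phi(n) \ll n$, this lemma implies that to study the asymptotic behavior of
$(E(\mathcal{N}_n^\phi([0, x])))$, it is enough to analyze the convergence of the corresponding sum
of the $E(e^{-nP_i})$. For the moment, $k \in \mathbb{N}$ is fixed, if $n \ge 1$, i > p, then

$$E\left(e^{-nP_i}\right) = E\left[\exp\left(-n D_p e^{-p t_{\lfloor p \rfloor}} E / i^{p+1}\right)\right] = E\left(\frac{i^{p+1}/n}{i^{p+1}/n + D_p e^{-p t_{\lfloor p \rfloor}}}\right),$$

by summing up these terms, if $\varepsilon_{k,n} = k/n^{1/(p+1)}$, one gets that

$$\sum_{i=\lfloor p \rfloor+1}^{k} E\left(e^{-nP_i}\right) = n^{1/(p+1)} \int_0^{\varepsilon_{k,n}} E\left(\frac{v^{p+1}}{v^{p+1} + D_p e^{-p t_{\lfloor p \rfloor}}}\right) dv + O(\varepsilon_{k,n}),$$

which gives the relation

$$\sum_{i=\lfloor p \rfloor+1}^{k} E\left(e^{-nP_i}\right) = n^{1/(p+1)} \varepsilon_{k,n} \int_0^1 E\left(\frac{(\varepsilon_{k,n} v)^{p+1}}{(\varepsilon_{k,n} v)^{p+1} + D_p e^{-p t_{\lfloor p \rfloor}}}\right) dv + O(\varepsilon_{k,n}),$$

with a change of variable. By using Equation (1) and again a change of variable,
one obtains the relation

(12) $$\sum_{i=\lfloor p \rfloor+1}^{k} E\left(e^{-nP_i}\right) = n^{1/(p+1)} \varepsilon_{k,n}^{(2p+1)/p} \frac{\lfloor p \rfloor}{p} \int_0^{1/\varepsilon_{k,n}^{p+1}} u^{1/p-1} \left(1 - \varepsilon_{k,n}^{(p+1)/p} u^{1/p}\right)^{\lfloor p \rfloor - 1} du \int_0^1 E\left(\frac{v^{p+1}}{v^{p+1} + u D_p}\right) dv + O(\varepsilon_{k,n}).$$

This quantity is now analyzed according to the values of p.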



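Case p > 1.

If $k_n = \lfloor x n^\alpha \rfloor$ with $1/(2p+1) < \alpha < 1/(p+1)$, then $\varepsilon_{k_n,n} \sim x n^{(\alpha(p+1)-1)/(p+1)}$ and,
by Relation (12),

$$\lim_{n \to +\infty} \frac{1}{n^{(\alpha(2p+1)-1)/p}} \sum_{i=\lfloor p \rfloor+1}^{k_n} E\left(e^{-nP_i}\right) = x^{(2p+1)/p} \frac{\lfloor p \rfloor}{p} \int_0^{+\infty} u^{1/p-1}\,du \int_0^1 E\left(\frac{v^{p+1}}{v^{p+1} + u D_p}\right) dv.$$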

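Case p = 1.

Equation (12) is for this case

$$\sum_{i=2}^{k} E\left(e^{-nP_i}\right) = n^{1/2} \varepsilon_{k,n}^3 \int_0^{1/\varepsilon_{k,n}^2} du \int_0^1 E\left(\frac{v^2}{v^2 + u D_1}\right) dv + O(\varepsilon_{k,n}).$$

If $k_n = \lfloor x n^{1/3} / \log^\beta n \rfloor$ with $\beta \in \mathbb{R}$, then $\varepsilon_{k_n,n} \sim x/(n^{1/6} (\log n)^\beta)$ and for $\beta < 1/3$,

$$\lim_{n \to +\infty} \frac{1}{(\log n)^{1-3\beta}} \sum_{i=2}^{k_n} E\left(e^{-nP_i}\right) = \frac{1}{9} x^3 E\left(\frac{1}{D_1}\right).$$

The following proposition has therefore been proved.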

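Proposition 5 (Average of the Number of Empty Bins). For $\alpha, \beta \ge 0$, for $n \in \mathbb{N}$,
denote by $p_{\alpha,\beta}(n) = n^\alpha (\log n)^{-\beta}$, and by convention $p_\alpha = p_{\alpha,0}$.

(1) If p > 1 and $1/(2p+1) < \alpha < 1/(p+1)$,

$$\lim_{n \to +\infty} \frac{1}{n^{(\alpha(2p+1)-1)/p}} E\left(\mathcal{N}_n^{p_\alpha}([0, x])\right) = x^{(2p+1)/p} \frac{\lfloor p \rfloor}{p} \int_0^{+\infty} u^{1/p-1}\,du \int_0^1 E\left(\frac{v^{p+1}}{v^{p+1} + u D_p}\right) dv.$$

(2) If p = 1 and $\beta < 1/3$,

$$\lim_{n \to +\infty} \frac{1}{(\log n)^{1-3\beta}} E\left(\mathcal{N}_n^{p_{1/3,\beta}}([0, x])\right) = \frac{1}{9} x^3 E\left(\frac{1}{D_1}\right).$$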

A Double Threshold. For the convergence in distribution of the sequence of
point processes $(\mathcal{N}_n^{\phi})$, Theorem 2 has shown that the correct scaling $\phi$ for the order
of magnitude of the indices of the first empty bins is given by $\phi(n) = n^{1/(\rho+1)}$,
$n > 1$. For the average number of points in a finite interval, the above proposition
states that, for $\rho > 1$, the correct scaling is in fact $\phi(n) = n^{1/(2\rho+1)} \ll n^{1/(\rho+1)}$.

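For $\alpha > 0$, with the notations of the above proposition, one concludes that under
the condition $\rho > 1$ and for $1/(2\rho+1) < \alpha < 1/(\rho+2)$, the following limit results
hold

$$\lim_{n\to+\infty} \mathcal{N}_n^{\phi_\alpha} \stackrel{\text{dist.}}{=} 0 \quad\text{and}\quad \lim_{n\to+\infty} \mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}[0,x]\right) = +\infty, \quad \forall x > 0.$$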

This suggests that, in this case, with high probability, all the bins with index
less than $n^{1/(\rho+1)}$ have a large number of balls, but also that there exists some
rare event for which a very large number of empty bins with indices of an order
slightly greater than $n^{1/(2\rho+1)}$ are created. The following proposition shows that
the total size of the first $\lfloor\rho\rfloor$ bins is the key variable to explain this phenomenon.
It should be of the order of $\log n$ in order to have sufficiently many empty bins in
the appropriate region.
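A heuristic computation indicates why the order $\log n$ appears. It is only a sketch: it assumes the approximation $P_i \approx D_\rho e^{-\rho t_{\lfloor\rho\rfloor}} E_i/i^{\rho+1}$ used for the averages above, and it looks at the event where $t_{\lfloor\rho\rfloor}$ is close to $\delta \log n$ for some $\delta > 0$.

```latex
% Heuristic only: bin i is empty with probability e^{-n P_i},
% which is not negligible as soon as n P_i = O(1).
\begin{align*}
t_{\lfloor\rho\rfloor} \approx \delta\log n
 &\;\Longrightarrow\; nP_i \approx D_\rho\, E_i\, \frac{n^{1-\rho\delta}}{i^{\rho+1}},\\
nP_i = O(1)
 &\iff i \gtrsim n^{(1-\rho\delta)/(\rho+1)},\\
n^{(1-\rho\delta)/(\rho+1)} \le n^{\alpha}
 &\iff \delta \ge \frac{1-\alpha(\rho+1)}{\rho}.
\end{align*}
```

The last threshold is the quantity denoted by $\delta_1(\alpha)$ in the proposition that follows: empty bins with indices of the order $n^{\alpha}$ become likely only when $t_{\lfloor\rho\rfloor}$ exceeds $\delta_1(\alpha)\log n$.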
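Proposition 6. For $\rho > 1$ and if $\phi_\alpha(n) = n^{\alpha}$, for $\alpha \in [1/(2\rho+1), 1/(\rho+2))$ and

$$\delta_0(\alpha) \stackrel{\text{def.}}{=} \frac{1-\alpha(\rho+2)}{\rho-1} \quad\text{and}\quad \delta_1(\alpha) \stackrel{\text{def.}}{=} \frac{1-\alpha(\rho+1)}{\rho},$$

then, for $a \in \mathbb{R}$ and $x > 0$,

(1) If $\delta < \delta_0(\alpha)$, then

$$\lim_{n\to+\infty} \mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}([0,x])\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq \delta\log n\}}\right) = 0.$$

(2) If $\delta \in [\delta_0(\alpha), \delta_1(\alpha)[$, then

$$\lim_{n\to+\infty} \frac{\mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}([0,x])\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq \delta\log n + a\}}\right)}{n^{(\rho+2)\alpha+\delta(\rho-1)-1}} = \frac{x^{\rho+2}\lfloor\rho\rfloor}{(\rho+2)(\rho-1)}\, \mathbb{E}\left(\frac{1}{D_\rho}\right) e^{(\rho-1)a}.$$

(3) If $\delta \geq \delta_1(\alpha)$,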

I pj log n+a} 

= ^(2P+1)/P M U^/P-ldu [\( ^ dv, 

where $D_\rho$ is the random variable defined by Equation (11).
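The three regimes of the proposition fit together through the exponent of $n$ in case (2). The following check uses only the definitions of $\delta_0(\alpha)$ and $\delta_1(\alpha)$; the notation $\kappa(\delta)$ is introduced here for convenience and is not taken from the text.

```latex
% kappa(delta): exponent of n in the normalisation of case (2).
\begin{align*}
\kappa(\delta) &:= (\rho+2)\alpha + \delta(\rho-1) - 1, \\
\kappa(\delta_0(\alpha)) &= (\rho+2)\alpha + \bigl(1-\alpha(\rho+2)\bigr) - 1 = 0, \\
\kappa(\delta_1(\alpha)) &= \frac{\rho(\rho+2)\alpha + (\rho-1)\bigl(1-\alpha(\rho+1)\bigr) - \rho}{\rho}
                          = \frac{\alpha(2\rho+1)-1}{\rho}, \\
\delta_1(\alpha) - \delta_0(\alpha) &= \frac{\alpha(2\rho+1)-1}{\rho(\rho-1)} \;\ge\; 0
   \quad\text{for } \alpha \ge \frac{1}{2\rho+1}.
\end{align*}
```

So the normalisation is of order one at $\delta = \delta_0(\alpha)$, it grows with $\delta$ on $[\delta_0(\alpha), \delta_1(\alpha)[$, and at $\delta = \delta_1(\alpha)$ it reaches the exponent $(\alpha(2\rho+1)-1)/\rho$ of the unconditioned average. The condition $\alpha < 1/(\rho+2)$ is what makes $\delta_0(\alpha)$ positive.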

Proof. To begin with, it is assumed that $\delta \in [\delta_0(\alpha), \delta_1(\alpha)[$. If $k > 1$, $b > 0$, $\varepsilon_{k,n} = k/n^{1/(\rho+1)}$, $k = \lfloor x n^{\alpha} \rfloor$ and $b = \delta\log n + a$, in the same way as for Equation (12)
one gets



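$$\sum_{i=\lfloor\rho\rfloor+1}^{k} \mathbb{E}\left(e^{-nP_i}\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq b\}}\right) = n^{1/(\rho+1)}\, \varepsilon_{k,n}^{(2\rho+1)/\rho}\, \frac{\lfloor\rho\rfloor}{\rho} \int_{e^{-\rho b}/\varepsilon_{k,n}^{\rho+1}}^{1/\varepsilon_{k,n}^{\rho+1}} u^{1/\rho-1}\left(1 - \varepsilon_{k,n}^{(\rho+1)/\rho} u^{1/\rho}\right)^{\lfloor\rho\rfloor-1} du \int_0^1 \mathbb{E}\left(\frac{v^{\rho+1}}{v^{\rho+1}+uD_\rho}\right) dv + O\left(\varepsilon_{k,n}\right). \tag{13}$$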

Note that

$$e^{-\rho b}/\varepsilon_{k,n}^{\rho+1} \sim x^{-(\rho+1)}\, n^{1-\rho\delta-\alpha(\rho+1)}\, e^{-\rho a},$$

hence the range of the first integral goes to infinity as n gets large. Since 

/ E — -i ] dv= E — — ; — dv, 

Jo \vP+^+uDp Dp J Jo \{yP+^+uDp)Dp) 

by Lebesgue's Theorem, this integral is arbitrarily small as u gets large, this implies 
the equivalence 

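$$\sum_{i=\lfloor\rho\rfloor+1}^{k} \mathbb{E}\left(e^{-nP_i}\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq b\}}\right) \sim \frac{\lfloor\rho\rfloor}{\rho(\rho+2)}\, \mathbb{E}\left(\frac{1}{D_\rho}\right) n^{1/(\rho+1)}\, \varepsilon_{k,n}^{(2\rho+1)/\rho} \int_{e^{-\rho b}/\varepsilon_{k,n}^{\rho+1}}^{1/\varepsilon_{k,n}^{\rho+1}} u^{1/\rho-2}\, du.$$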
If $C$ is the multiplicative constant of the right hand side of the above relation, then



this gives the equivalence 

k 



i=l P 

The proof of this case is completed. 
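The constant and the power of $n$ obtained in this case can be recovered by a direct computation. The sketch below is made under assumptions: the inner integral is replaced by its large-$u$ equivalent $\mathbb{E}(1/D_\rho)/((\rho+2)u)$, the upper limit of the $u$-integral is replaced by $+\infty$ (its contribution is of lower order since $e^{\rho b} \to +\infty$), and the symbol $L$ is introduced here for the lower limit.

```latex
% L: lower limit of the u-integral, b = delta log n + a, k = floor(x n^alpha).
\begin{align*}
L &:= e^{-\rho b}/\varepsilon_{k,n}^{\rho+1}, \qquad
\int_L^{+\infty} u^{1/\rho-2}\,du = \frac{\rho}{\rho-1}\,L^{-(\rho-1)/\rho}
  = \frac{\rho}{\rho-1}\, e^{(\rho-1)b}\,\varepsilon_{k,n}^{(\rho+1)(\rho-1)/\rho},\\
\varepsilon_{k,n}^{\frac{2\rho+1}{\rho}}\cdot\varepsilon_{k,n}^{\frac{(\rho+1)(\rho-1)}{\rho}}
  &= \varepsilon_{k,n}^{\rho+2}, \qquad
n^{\frac{1}{\rho+1}}\,\varepsilon_{k,n}^{\rho+2} \sim x^{\rho+2}\, n^{(\rho+2)\alpha-1},\\
e^{(\rho-1)b} &= n^{\delta(\rho-1)}\, e^{(\rho-1)a}.
\end{align*}
```

Multiplying these terms by $\lfloor\rho\rfloor\,\mathbb{E}(1/D_\rho)/(\rho(\rho+2))$ gives the order $n^{(\rho+2)\alpha+\delta(\rho-1)-1}$ with the constant $x^{\rho+2}\lfloor\rho\rfloor\,\mathbb{E}(1/D_\rho)\,e^{(\rho-1)a}/((\rho+2)(\rho-1))$, in agreement with case (2) of the proposition.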

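The case $\delta \geq \delta_1(\alpha)$ uses Equation (13). The term $e^{-\rho b}/\varepsilon_{k,n}^{\rho+1}$ converges to $e^{-\rho a}/x^{\rho+1}$
if $\delta = \delta_1(\alpha)$ and to $0$ otherwise. This gives directly the desired convergence.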

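Finally, if $\delta < \delta_0(\alpha)$, for any $a \in \mathbb{R}$, there exists $n_0$ so that if $n \geq n_0$, then
$\delta\log n \leq \delta_0(\alpha)\log n + a$, in particular

$$\mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}([0,x])\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq \delta\log n\}}\right) \leq \mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}([0,x])\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq \delta_0(\alpha)\log n + a\}}\right),$$

hence

$$\limsup_{n\to+\infty} \mathbb{E}\left(\mathcal{N}_n^{\phi_\alpha}([0,x])\, \mathbb{1}_{\{t_{\lfloor\rho\rfloor} \leq \delta\log n\}}\right) \leq \frac{x^{\rho+2}\lfloor\rho\rfloor}{(\rho+2)(\rho-1)}\, \mathbb{E}\left(\frac{1}{D_\rho}\right) e^{(\rho-1)a}.$$

One concludes by letting $a$ go to $-\infty$. □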

As a consequence of the above proposition, for $\alpha \in [1/(2\rho+1), 1/(\rho+2))$, the
average of the variable $\mathcal{N}_n^{\phi_\alpha}([0,x])$ converges to infinity only when the total size $t_{\lfloor\rho\rfloor}$
of the first $\lfloor\rho\rfloor$ bins is of the order $\delta\log n$ for a sufficiently large $\delta$. The following
corollary gives a more precise formulation.

Corollary 2. For $\rho > 1$ and if $\phi_\alpha(n) = n^{\alpha}$, for $\alpha \in [1/(2\rho+1), 1/(\rho+2))$ and

$$\delta_1(\alpha) = (1-\alpha(\rho+1))/\rho,$$

then, for $a, b > 0$,

E f 7VP° ([0, log„-a<tLpj <<5i(q) logn+f,} j 

where, for $y, z \in \mathbb{R}$, $\psi(y,z) = \phi(y,z)/\phi(-\infty,+\infty)$ and

0(2/, .) = x^^P+^yP M f u'/P^idu f E ( . f ^ ) d^. 

A rough (non-rigorous) interpretation of this result could be as follows: on the
event where "most" (i.e., for the averages) of the empty bins are created in the interval
$[0, x n^{\alpha}]$, the random variable $t_{\lfloor\rho\rfloor} - \delta_1(\alpha)\log n$ converges in distribution to some
random variable $X$ on $\mathbb{R}$, such that $\mathbb{P}(X \leq a) = \psi(-\infty, a)$.

The following analogous result is proved in a similar way for the critical case 
p=l. 

Proposition 7. For $\rho = 1$ and with the notations of the above proposition then,
for $0 < \beta < 1/3$, $x > 0$, and for 0 < a < 1/3,



= —X 



D 
and for a > 1/3,

where $D_1$ is the random variable defined by Equation (11).

6. Generalizations 

The problem analyzed in the present paper can be generalized in two directions.
On the one hand, the sequence $(t_n)$ can stem from a general branching process
instead of the particular Yule one; on the other hand, the locations of balls can
have a general distribution. This section discusses these possible extensions.

Exponential Balls and General Branching Process. Let $(t_n)$ be the birth
instants of a general supercritical branching process $(Z(t))$. See Kingman [13] and
Nerman [IB] for example. Let $\alpha$ be the Malthusian parameter, and $W$ the almost
sure limit of $(e^{-\alpha t}Z(t))$. Under reasonable technical assumptions, Härnqvist [3] has
shown the following result:

Theorem 3. Define the point process by 

fc>i 

as $t$ gets large, it converges in distribution to a mixed Poisson process whose
parameter is distributed as $\gamma W$ for some constant $\gamma > 0$.

From this result, it is possible to prove that the process $(n(t_{n+k} - t_n), k \geq 1)$
converges in distribution, as $n$ goes to infinity, to a Poisson process: clearly

fe>l 

and provided that, up to a multiplicative constant, $e^{-\alpha t_k} k$ converges to $W$, the
point process $\sum_{k\geq 1} \delta_{n(t_{n+k}-t_n)}$ should converge to a Poisson process with
a deterministic parameter. In this case the probability that a ball falls into the $n$th
bin, which is given by

$$P_n = e^{-\rho t_{n-1}}\left(1 - e^{-\rho(t_n - t_{n-1})}\right),$$

has therefore the following asymptotic behavior

$$P_n \sim n^{-\rho/\alpha-1}\, W^{\rho/\alpha}\, E_n$$

where $(E_i)$ are i.i.d. exponential random variables. In the Bellman-Harris case,
following Athreya and Kaplan [1], it is possible to show that $W$ and $(E_i)$ are
independent, so that in this case, the asymptotic behavior of $(P_n)$ is exactly the
same as in the case of a Yule process. One can conjecture that this independence
property still holds in the general case.
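For the Yule process itself (Malthusian parameter equal to 1), the form of this asymptotic behavior can be recovered by a short heuristic. It is a sketch under two assumptions taken from the setting of the paper: the increments are $t_n - t_{n-1} = E_n/n$ with $(E_n)$ i.i.d. exponential with parameter 1, and $n e^{-t_n}$ converges almost surely to $W$.

```latex
% Yule case: first-order expansion of the two factors of P_n.
\begin{align*}
e^{-\rho t_{n-1}} &\approx \left(\frac{W}{n}\right)^{\rho}, \qquad
1 - e^{-\rho E_n/n} \approx \frac{\rho E_n}{n},\\
P_n = e^{-\rho t_{n-1}}\bigl(1 - e^{-\rho(t_n-t_{n-1})}\bigr)
   &\approx \rho\, W^{\rho}\, \frac{E_n}{n^{\rho+1}}.
\end{align*}
```

With a general Malthusian parameter $\alpha$, the same two steps applied to $e^{-\alpha t_n} n \to W$ (up to a multiplicative constant) lead to the exponents $\rho/\alpha$ and $\rho/\alpha + 1$ in place of $\rho$ and $\rho + 1$.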

The main obstacle to generalizing the results of this paper, even in the Bellman-Harris
case, is that although $W$ and $(E_i)$ are independent, $t_{n-1}$ and $t_n - t_{n-1}$ are
not independent. In the proof of Proposition 1, this independence plays a crucial
role; it has therefore to be generalized to variables which are only asymptotically
independent. Additionally, since the heavy tail property of the limiting variable
is also true in the general case, see e.g. Liu [15], a similar rare events phenomenon
to the one described in Section 5 is plausible in this case.






General Balls and Yule Process. When the underlying branching process is
changed, the above discussion suggests that the asymptotic behavior of the sequence
$(P_n)$ remains essentially the same as for a Yule process. The situation changes
significantly when the law of the location $X$ of a ball is changed; in this case, with
the same notations as before for the Yule process,

$$P_n = \mathbb{P}\left(t_{n-1} < X < t_{n-1} + E_n/n\right).$$

The tail distribution of $X$ then plays a key role. Consider for instance a power law,
i.e., $\mathbb{P}(X > x)$ behaves as $\delta x^{-\beta}$ for some $\beta$ and $\delta > 0$: then

and it can be seen that the random variable $W_\infty$ may not play a role anymore in
the asymptotic behavior of the system.
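One possible first-order computation supporting this remark is sketched below. It assumes a pure power tail, $\mathbb{P}(X > x) = \delta x^{-\beta}$ for all large $x$, and uses the fact that for the Yule process $t_{n-1} = \log n - \log W_\infty + o(1)$ almost surely; both are assumptions of the sketch rather than statements of the text.

```latex
% Power-law location X: P_n conditionally on the split times.
\begin{align*}
P_n &= \mathbb{P}(X > t_{n-1}) - \mathbb{P}(X > t_{n-1} + E_n/n)
     = \delta\, t_{n-1}^{-\beta}\Bigl(1 - \bigl(1 + \tfrac{E_n}{n\,t_{n-1}}\bigr)^{-\beta}\Bigr)
     \approx \delta\beta\, \frac{E_n}{n\, t_{n-1}^{\beta+1}},\\
t_{n-1} &= \log n - \log W_\infty + o(1)
  \;\Longrightarrow\;
P_n \approx \delta\beta\, \frac{E_n}{n\,(\log n)^{\beta+1}}.
\end{align*}
```

In this approximation $W_\infty$ enters only through the correction $\log W_\infty$, which is negligible compared with $\log n$, whereas in the exponential case it appears as the multiplicative factor $W_\infty^{\rho}$.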